Empar: EM-based algorithm for parameter estimation of Markov models on trees
Authors
Abstract
The goal of branch length estimation in phylogenetic inference is to estimate the divergence time between a set of sequences based on the compositional differences between them. Several software packages are currently available for branch length estimation under homogeneous and stationary evolutionary models. Homogeneity of the evolutionary process imposes fixed rates of evolution throughout the tree. For complex data sets, this assumption is likely to call the results of the analysis into question. In this work we propose an algorithm for parameter and branch length inference in discrete-time Markov processes on trees. This broad class of nonhomogeneous models comprises the general Markov model and all its submodels, including both stationary and nonstationary models. We adapt the well-known Expectation-Maximization algorithm and present a detailed performance study of this approach for a selection of nonhomogeneous evolutionary models. We conducted an extensive performance assessment on multiple sequence alignments simulated under a variety of settings. The tool showed high accuracy in parameter estimation and branch length recovery, which indicates that the method is a valuable tool for phylogenetic inference in real-life problems. Empar is an open-source C++ implementation of the methods introduced in this paper and is the first tool designed to handle nonhomogeneous data.
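To make the EM idea in the abstract concrete, the following is a minimal sketch on the smallest nontrivial case: a star tree with one hidden root and three observed leaves, where each edge carries its own transition matrix (so no homogeneity or stationarity is assumed). It is not Empar's C++ code. The function name `em_star_tree`, the pattern-count dictionary, the restriction to a star tree, and the random initialization are all assumptions made for illustration.

```python
import math
import random

def em_star_tree(counts, k, iters=50, seed=0):
    """EM for a general Markov model on a 3-leaf star tree with a hidden root.

    counts: dict mapping observed leaf patterns (a, b, c) to their counts.
    Returns (pi, P, trace): pi is the root distribution, P[e][x][y] is the
    probability of state y at leaf e given root state x, and trace holds the
    log-likelihood evaluated at the start of each iteration.
    """
    rng = random.Random(seed)

    def random_row(favoured):
        # Random stochastic row, biased towards "no change" (hypothetical choice).
        row = [rng.random() + (k if y == favoured else 0.0) for y in range(k)]
        total = sum(row)
        return [v / total for v in row]

    pi = [1.0 / k] * k
    P = [[random_row(x) for x in range(k)] for _ in range(3)]
    trace = []

    for _ in range(iters):
        # E-step: expected counts of the hidden root state for each pattern.
        root_counts = [0.0] * k
        edge_counts = [[[0.0] * k for _ in range(k)] for _ in range(3)]
        loglik = 0.0
        for (a, b, c), n in counts.items():
            joint = [pi[x] * P[0][x][a] * P[1][x][b] * P[2][x][c]
                     for x in range(k)]
            marginal = sum(joint)
            loglik += n * math.log(marginal)
            for x in range(k):
                w = n * joint[x] / marginal  # posterior weight times count
                root_counts[x] += w
                edge_counts[0][x][a] += w
                edge_counts[1][x][b] += w
                edge_counts[2][x][c] += w
        trace.append(loglik)

        # M-step: normalise expected counts into probabilities.
        total = sum(root_counts)
        pi = [r / total for r in root_counts]
        P = [[[edge_counts[e][x][y] / root_counts[x] for y in range(k)]
              for x in range(k)] for e in range(3)]

    return pi, P, trace

if __name__ == "__main__":
    # Toy binary-state pattern counts, invented purely for illustration.
    toy = {(0, 0, 0): 40, (1, 1, 1): 35, (0, 0, 1): 8,
           (0, 1, 0): 7, (1, 0, 0): 5, (1, 1, 0): 5}
    pi, P, trace = em_star_tree(toy, k=2, iters=30)
    print("root distribution:", pi)
    print("log-likelihood gain:", trace[-1] - trace[0])
```

On larger trees the E-step would need posteriors over all interior nodes rather than a single root, which this sketch does not attempt.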
Similar resources
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
We describe the maximum-likelihood parameter estimation problem and how the Expectation-Maximization (EM) algorithm can be used for its solution. We first describe the abstract form of the EM algorithm as it is often given in the literature. We then develop the EM parameter estimation procedure for two applications: 1) finding the parameters of a mixture of Gaussian densities, and 2) finding the...
Time discretization of continuous-time filters and smoothers for HMM parameter estimation
In this paper we propose algorithms for parameter estimation of fast-sampled homogeneous Markov chains observed in white Gaussian noise. Our algorithms are obtained by the robust discretization of stochastic differential equations involved in the estimation of continuous-time Hidden Markov Models (HMMs) via the EM algorithm. We present two algorithms: The first is based on the robust discretiz...
Noise Compensation by a Sequential Kullback Proximal Algorithm
We present sequential parameter estimation in the framework of Hidden Markov Models. The sequential algorithm is a sequential Kullback proximal algorithm, which uses the Kullback-Leibler divergence as a penalty function for the maximum likelihood estimation. The scheme is implemented as filters. In contrast to algorithms based on the sequential EM algorithm, the algorithm has faster conver...
An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set
Nowadays, Hidden Markov models are extensively utilized for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic for determining the values of Markov functions on an unstable statistical data set has led to a significant reducti...
Estimating rate constants in hidden Markov models by the EM algorithm
The EM algorithm, e.g., the Baum–Welch re-estimation, is an important tool for parameter estimation in discrete-time hidden Markov models. We present a direct re-estimation of rate constants for applications in which the underlying Markov process is continuous in time. Previous estimation of discrete-time transition probabilities is not necessary.